52 research outputs found

Homology-based inference sets the bar high for protein function prediction

Author: Ariane Boehm
Burkhard Rost
Cedric Landerer
Christian Schaefer
Denis Krompass
Dominik Achten
Esmeralda Vicedo
Florian Auer
Manfred Roos
Mark Heron
Maximilian Hecht
Michael Kiening
Peter Hönigschmid
Rebecca Kassner
Stefan Seemayer
Stefanie Kaufmann
Tatjana Braun
Thomas A Hopf
Tobias Hamp
Yannick Mahlich
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2013

Background: Any method that de novo predicts protein function should do better than random. More challenging, it also ought to outperform simple homology-based inference. Methods: Here, we describe a few methods that predict protein function exclusively through homology. Together, they set the bar or lower limit for future improvements. Results and conclusions: During the development of these methods, we faced two surprises. Firstly, our most successful implementation for the baseline ranked very high at CAFA1. In fact, our best combination of homology-based methods fared only slightly worse than the top-of-the-line prediction method from the Jones group. Secondly, although the concept of homology-based inference is simple, this work revealed that the precise details of the implementation are crucial: not only did the methods span from top to bottom performers at CAFA, but also the reasons for these differences were unexpected. In this work, we also propose a new rigorous measure to compare predicted and experimental annotations. It puts more emphasis on the details of protein function than the other measures employed by CAFA and may best reflect the expectations of users. Clearly, the definition of proper goals remains one major objective for CAFA.

OPUS Augsburg

Columbia University Academic Commons

Springer

Springer - Publisher Connector

PubMed Central

LocTree3 prediction of localization

Author: Alberts
Aleksandr Sorokoumov
Alexander Betz
Alice Meier
Altschul
Altschul
Bairoch
Berman
Briesemeister
Burkhard Rost
Dimmer
Goldberg
Guy Yachdav
Hamp
Hassan Nasir
Henrik Nielsen
Horton
Huh
Ilira Troshani
Imai
Jonas Reeb
Jonas Zierer
Julia Gerke
Kajan
Katharina Hembach
Kieu Trinh Do
Kinga Balasz
Koonin
Kuang
Laura Cizmadija
Lee
Maria Kalemanov
Max Herzog
Maximilian Hastreiter
Maximilian Hecht
Michael Bernhofer
Michael Kluge
Mika
Mooney
Nadeem Ahmed
Philipp Angerer
Przybylski
Radivojac
Robert Greil
Rost
Rost
Sander
Simpson
Sonja Ansorge
Sonja Waldraff
Susann Vorberg
Tatyana Goldberg
Timothy Karl
Tobias Hamp
Ulrich Neumaier
Uwe Altermann
Vadim Joerdens
Verena Prade
Yachdav
Yu
Yu
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2014

The prediction of protein sub-cellular localization is an important step toward elucidating protein function. For each query protein sequence, LocTree2 applies machine learning (profile kernel SVM) to predict the native sub-cellular localization in 18 classes for eukaryotes, in six for bacteria and in three for archaea. The method outputs a score that reflects the reliability of each prediction. LocTree2 has performed on par with or better than any other state-of-the-art method. Here, we report the availability of LocTree3 as a public web server. The server includes the machine learning-based LocTree2 and improves over it through the addition of homology-based inference. Assessed on sequence-unique data, LocTree3 reached an 18-state accuracy Q18 = 80 ± 3% for eukaryotes and a six-state accuracy Q6 = 89 ± 4% for bacteria. The server accepts submissions ranging from single protein sequences to entire proteomes. Response time of the unloaded server is about 90 s for a 300-residue eukaryotic protein and a few hours for an entire eukaryotic proteome not considering the generation of the alignments. For over 1000 entirely sequenced organisms, the predictions are directly available as downloads. The web server is available at http://www.rostlab.org/services/loctree3

Crossref

PubMed Central

Online Research Database In Technology

The landscape of tolerated genetic variation in humans and primates

Author: Abee Christian
Adhikari Aashish
Agueda Lidia
Aguet Francois
Amaral João Valsecchi do
Andriaholinirina Nicole
Balick Daniel
Bataillon Thomas
Batzoglou Serafim
Beck Robin M. D.
Bergman Juraj
Bertuol Fabrício
Blanc Julie
Boubli Jean P.
Byrne Hazel
Chen Chen
Chuma Idriss S.
Dietrich Anastasia S. D.
Ede Jeffrey
Farh Kyle Kai-How
Farias Izeni
Fernandez-Duque Eduardo
Field Yair
Fiziev Petko P.
Frandsen Peter
Gao Hong
Goodhead Ian
Guschanski Katerina
Gut Ivo
Gut Marta
Hamp Tobias
Harris R. Alan
Horvath Julie E.
Hrbek Tomas
Hvilsom Christina
Janiak Mareike C.
Jensen Axel
Jolly Clifford J.
Juan David
Kanthaswamy Sree
Keyyu Julius D.
Khor Chiea Chuen
Kitchener Andrew C.
Knauf Sascha
Kuderna Lukas F. K.
Kuhlwilm Martin
Le Minh D.
Lee Jessica
Lek Monkol
Lemire Gabrielle
Lim Weng Khong
Lizano Esther
Manu Shivakumara
Marques-Bonet Tomas
McRae Jeremy
Melin Amanda
Melo Fabiano R. de
Merker Stefan
Messias Mariluce
Nadler Tilo
Navarro Arcadi
Orkin Joseph D.
O’Donnell-Luria Anne
Phillips-Conroy Jane
Rabarivola Clément J.
Raveendran Muthuswamy
Rehm Heidi L.
Reimers Rebecca
Rogers Jeffrey
Roos Christian
Rossi Rogerio
Rousselle Marjolaine
Sampaio Iracilda
Schierup Mikkel Heide
Schraiber Joshua G.
Shao Yong
Shiferaw Fekadu
Silva Felipe Ennes
Silva Maria N. F. da
Simmons Joe H.
Singer-Berk Moriel
Sundaram Laksshman
Sunyaev Shamil
Tan Patrick
Trivedi Mihir
Umapathy Govindhaswamy
Valenzuela Alejandro
Vries Dorien de
Wilkerson Gregory
Wu Dongdong
Wu Yibing
Xu Jinbo
Yang Yanshen
Zaramody Alphonse
Zhang Guojie
Zhou Long
Zinner Dietmar
Publication date: 02/06/2023

Edinburgh Research Explorer

An expanded evaluation of protein function prediction methods shows an improvement in accuracy

Author: Almeida-e-Silva Danillo C.
Altenhoff Adrian
Babbitt Patricia C.
Bankapur Asma R.
Bargsten Joachim W.
Ben-Hur Asa
Benso Alfredo
Bhat Prajwal
Bkc Dukka
Bonneau Richard
Brenner Steven E.
Bryson Kevin
Cao Renzhi
Casadio Rita
Cejuela Juan M.
Chapman Samuel
Chen Ching-Tai
Cheng Jianlin
Cibrian-Uhalte Elena
Clark Wyatt T.
Cozzetto Domenico
D'Andrea Daniel
Das Sayoni
Dawson Natalie L.
del Pozo Angela
Denny Paul
Dessimoz Christophe
Di Carlo Stefano
Dogan Tunca
ElShal Sarah
Falda Marco
Fang Hai
Feng Shou
Fernández José M.
Ferrari Carlo
Fontana Paolo
Foulger Rebecca E.
Friedberg Iddo
Funk Christopher S.
Gabaldon Toni
Gemovic Branislava
Gillis Jesse
Ginter Filip
Giollo Manuel
Glisic Sanja
Goldberg Tatyana
Gong Qingtian
Gough Julian
Greene Casey S.
Hakala Kai
Hamp Tobias
Hieta Reija
Holm Liisa
Hsu Wen-Lian
Huntley Rachael P.
Jiang Yuxiang
Jones David T.
Kaewphan Suwisa
Kahanda Indika
Kansakar Lakesh
Khan Ishita K.
Kihara Daisuke
Koo Da Chen Emily
Koskinen Patrik
Lavezzo Enrico
Lee David
Lees Jonathan G.
Legge Duncan
Lepore Rosalba
Li Biao
Lin Alexandra
Linial Michal
Lovering Ruth C.
Magrane Michele
Maietta Paolo
Marcet-Houben Marina
Martelli Pier Luigi
Martin Maria J.
Mehryary Farrokh
Melidoni Anna N.
Mesiti Marco
Minneci Federico
Mooney Sean D.
Moreau Yves
Mutowo-Meullenet Prudence
Nepusz Tamás
Ning Wei
O'Donovan Claire
Oates Matt
Ofer Dan
Orengo Christine A.
Oron Tal Ronnen
Paccanaro Alberto
Pavlidis Paul
Penfold-Brown Duncan
Perovic Vladimir
Pichler Klemens
Piovesan Damiano
Politano Gianfranco
Profiti Giuseppe
Radivojac Predrag
Rappoport Nadav
Re Matteo
Rehman Hafeez Ur
Richter Lothar
Robinson Peter N.
Romero Alfonso E.
Rost Burkhard
Sahraeian Sayed M.E.
Salakoski Tapio
Salamov Asaf
Sasidharan Rajkumar
Savino Alessandro
Sedeño-Cortés Adriana E.
Sharan Malvika
Shasha Dennis
Shypitsyna Aleksandra
Sillitoe Ian
Skunca Nives
Smithers Ben
Stern Amos
Sternberg Michael J.E.
Supek Fran
Tian Weidong
Toppo Stefano
Tosatto Silvio C.E.
Tramontano Anna
Tranchevent Léon-Charles
Tress Michael L.
Törönen Petri
Valencia Alfonso
Valentini Giorgio
van Dijk Aalt D.J.
Veljkovic Nevena
Veljkovic Veljko
Vencio Ricardo ZN
Verspoor Karin M.
Vogel Jörg
Vucetic Slobodan
Wang Zheng
Wass Mark N.
Yang Haixuan
Youngs Noah
Zakeri Pooya
Zhang Shanshan
Zhong Zhaolong
Zhou Yuanpeng
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

Background: A major bottleneck in our understanding of the molecular underpinnings of life is the assignment of function to proteins. While molecular experiments provide the most reliable annotation of proteins, their relatively low throughput and restricted purview have led to an increasing role for computational function prediction. However, assessing methods for protein function prediction and tracking progress in the field remain challenging. Results: We conducted the second critical assessment of functional annotation (CAFA), a timed challenge to assess computational methods that automatically assign protein function. We evaluated 126 methods from 56 research groups for their ability to predict biological functions using Gene Ontology and gene-disease associations using Human Phenotype Ontology on a set of 3681 proteins from 18 species. CAFA2 featured expanded analysis compared with CAFA1, with regard to data set size, variety, and assessment metrics. To review progress in the field, the analysis compared the best methods from CAFA1 to those of CAFA2. Conclusions: The top-performing methods in CAFA2 outperformed those from CAFA1. This increased accuracy can be attributed to a combination of the growing number of experimental annotations and improved methods for function prediction. The assessment also revealed that the definition of top-performing algorithms is ontology specific, that different performance metrics can be used to probe the nature of accurate predictions, and the relative diversity of predictions in the biological process and human phenotype ontologies. While there was methodological improvement between CAFA1 and CAFA2, the interpretation of results and usefulness of individual methods remain context-dependent. Keywords: Protein function prediction, Disease gene prioritization

Brage HiM

An Expanded Evaluation of Protein Function Prediction Methods Shows an Improvement In Accuracy

Author: Almeida-e-Silva Danillo C.
Altenhoff Adrian
Babbitt Patricia C.
Bankapur Asma R.
Bargsten Joachim W.
Ben-Hur Asa
Benso Alfredo
Bhat Prajwal
BKC Dukka
Bonneau Richard
Brenner Steven E.
Bryson Kevin
Cao Renzhi
Casadio Rita
Cejuela Juan M.
Chapman Samuel
Chen Ching-Tai
Cheng Jianlin
Cibrian-Uhalte Elena
Clark Wyatt T.
Cozzetto Domenico
D'Andrea Daniel
Das Sayoni
Dawson Natalie L.
del Pozo Angela
Denny Paul
Dessimoz Christophe
Di Carlo Stefano
Dogan Tunca
ElShal Sarah
Falda Marco
Fang Hai
Feng Shou
Fernández José M.
Ferrari Carlo
Fontana Paolo
Foulger Rebecca E.
Friedberg Iddo
Funk Christopher S.
Gabaldon Toni
Gemovic Branislava
Gillis Jesse
Ginter Filip
Giollo Manuel
Glisic Sanja
Goldberg Tatyana
Gong Qingtian
Gough Julian
Greene Casey S.
Hakala Kai
Hamp Tobias
Hieta Reija
Holm Liisa
Hsu Wen-Lian
Huntley Rachael P.
Jiang Yuxiang
Jones David T.
Kaewphan Suwisa
Kahanda Indika
Kansakar Lakesh
Khan Ishita K.
Kihara Daisuke
Koo Da Chen Emily
Koskinen Patrik
Lavezzo Enrico
Lee David
Lees Jonathan G.
Legge Duncan
Lepore Rosalba
Li Biao
Lin Alexandra
Linial Michal
Lovering Ruth C.
Magrane Michele
Maietta Paolo
Marcet-Houben Marina
Martelli Pier Luigi
Martin Maria J.
Mehryary Farrokh
Melidoni Anna N.
Mesiti Marco
Minneci Federico
Mooney Sean D.
Moreau Yves
Mutowo-Meullenet Prudence
Nepusz Tamás
Ning Wei
O'Donovan Claire
Oates Matt
Ofer Dan
Orengo Christine A.
Oron Tal Ronnen
Paccanaro Alberto
Pavlidis Paul
Penfold-Brown Duncan
Perovic Vladimir
Pichler Klemens
Piovesan Damiano
Politano Gianfranco
Profiti Giuseppe
Radivojac Predrag
Rappoport Nadav
Re Matteo
Rehman Hafeez Ur
Richter Lothar
Robinson Peter N.
Romero Alfonso E.
Rost Burkhard
Sahraeian Sayed M.E.
Salakoski Tapio
Salamov Asaf
Sasidharan Rajkumar
Savino Alessandro
Sedeño-Cortés Adriana E.
Sharan Malvika
Shasha Dennis
Shypitsyna Aleksandra
Skunca Nives
Smithers Ben
Stern Amos
Sternberg Michael J.E.
Sillitoe Ian
Supek Fran
Tian Weidong
Toppo Stefano
Tosatto Silvio C.E.
Tramontano Anna
Tranchevent Léon-Charles
Tress Michael L.
Törönen Petri
Valencia Alfonso
Valentini Giorgio
van Dijk Aalt D.J.
Veljkovic Nevena
Veljkovic Veljko
Vencio Ricardo Z.N.
Verspoor Karin M.
Vogel Jörg
Vucetic Slobodan
Wang Zheng
Wass Mark N.
Yang Haixuan
Youngs Noah
Zakeri Pooya
Zhang Shanshan
Zhong Zhaolong
Zhou Yuanpeng
Publication venue: The Aquila Digital Community
Publication date: 07/09/2016

Aquila Digital Community

The landscape of tolerated genetic variation in humans and primates.

Author: Abee Christian
Adhikari Aashish
Agueda Lidia
Aguet Francois
Andriaholinirina Nicole
Balick Daniel
Bataillon Thomas
Batzoglou Serafim
Beck Robin M D
Bergman Juraj
Bertuol Fabrício
Blanc Julie
Boubli Jean P
Byrne Hazel
Chen Chen
Chuma Idriss S
da Silva Maria N F
de Melo Fabiano R
de Vries Dorien
Dietrich Anastasia S D
do Amaral João Valsecchi
Ede Jeffrey
Farh Kyle Kai-How
Farias Izeni
Fernandez-Duque Eduardo
Field Yair
Fiziev Petko P
Frandsen Peter
Gao Hong
Goodhead Ian
Guschanski Katerina
Gut Ivo
Gut Marta
Hamp Tobias
Harris R Alan
Horvath Julie E
Hrbek Tomas
Hvilsom Christina
Janiak Mareike C
Jensen Axel
Jolly Clifford J
Juan David
Kanthaswamy Sree
Keyyu Julius D
Khor Chiea Chuen
Kitchener Andrew C
Knauf Sascha
Kuderna Lukas F K
Kuhlwilm Martin
Le Minh D
Lee Jessica
Lek Monkol
Lemire Gabrielle
Lim Weng Khong
Lizano Esther
Manu Shivakumara
Marques-Bonet Tomas
McRae Jeremy
Melin Amanda
Merker Stefan
Messias Mariluce
Nadler Tilo
Navarro Arcadi
O'Donnell-Luria Anne
Orkin Joseph D
Phillips-Conroy Jane
Rabarivola Clément J
Raveendran Muthuswamy
Rehm Heidi L
Reimers Rebecca
Rogers Jeffrey
Roos Christian
Rossi Rogerio
Rousselle Marjolaine
Sampaio Iracilda
Schierup Mikkel Heide
Schraiber Joshua G
Shao Yong
Shiferaw Fekadu
Silva Felipe Ennes
Simmons Joe H
Singer-Berk Moriel
Sundaram Laksshman
Sunyaev Shamil
Tan Patrick
Trivedi Mihir
Umapathy Govindhaswamy
Valenzuela Alejandro
Wilkerson Gregory
Wu Dongdong
Wu Yibing
Xu Jinbo
Yang Yanshen
Zaramody Alphonse
Zhang Guojie
Zhou Long
Zinner Dietmar
Publication venue: American Association for the Advancement of Science
Publication date: 02/06/2023

Personalized genome sequencing has revealed millions of genetic differences between individuals, but our understanding of their clinical relevance remains largely incomplete. To systematically decipher the effects of human genetic variants, we obtained whole-genome sequencing data for 809 individuals from 233 primate species and identified 4.3 million common protein-altering variants with orthologs in humans. We show that these variants can be inferred to have nondeleterious effects in humans based on their presence at high allele frequencies in other primate populations. We use this resource to classify 6% of all possible human protein-altering variants as likely benign and impute the pathogenicity of the remaining 94% of variants with deep learning, achieving state-of-the-art accuracy for diagnosing pathogenic variants in patients with genetic diseases.

University of Salford Institutional Repository

Alternative Protein-Protein Interfaces Are Frequent Exceptions

Author: Burkhard Rost (6137)
Tobias Hamp (111320)
Publication date: 01/01/2012

The intricate molecular details of protein-protein interactions (PPIs) are crucial for function. Therefore, measuring the same interacting protein pair again, we expect the same result. This work measured the similarity in the molecular details of interaction for the same and for homologous protein pairs between different experiments. All scores analyzed suggested that different experiments often find exceptions in the interfaces of similar PPIs: up to 22% of all comparisons revealed some differences even for sequence-identical pairs of proteins. The corresponding number for pairs of close homologs reached 68%. Conversely, the interfaces differed entirely for 12–29% of all comparisons. All these estimates were calculated after redundancy reduction. The magnitude of interface differences ranged from subtle to the extreme, as illustrated by a few examples. An extreme case was a change of the interacting domains between two observations of the same biological interaction. One reason for different interfaces was the number of copies of an interaction in the same complex: the probability of observing alternative binding modes increases with the number of copies. Even after removing the special cases with alternative hetero-interfaces to the same homomer, a substantial variability remained. Our results strongly support the surprising notion that there are many alternative solutions to make the intricate molecular details of PPIs crucial for function.

Directory of Open Access Journals

PubMed Central

FigShare

Accelerating the Original Profile Kernel.

Author: Burkhard Rost
Tatyana Goldberg
Tobias Hamp
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2013

One of the most accurate multi-class protein classification systems continues to be the profile-based SVM kernel introduced by the Leslie group. Unfortunately, its CPU requirements render it too slow for practical applications of large-scale classification tasks. Here, we introduce several software improvements that enable significant acceleration. Using various non-redundant data sets, we demonstrate that our new implementation reaches a maximal speed-up as high as 14-fold for calculating the same kernel matrix. Some predictions are over 200 times faster and render the kernel as possibly the top contender in a low ratio of speed/performance. Additionally, we explain how to parallelize various computations and provide an integrative program that reduces creating a production-quality classifier to a single program call. The new implementation is available as a Debian package under a free academic license and does not depend on commercial software. For non-Debian based distributions, the source package ships with a traditional Makefile-based installer. Download and installation instructions can be found at https://rostlab.org/owiki/index.php/Fast_Profile_Kernel. Bugs and other issues may be reported at https://rostlab.org/bugzilla3/enter_bug.cgi?product=fastprofkernel

CiteSeerX

Columbia University Academic Commons

Directory of Open Access Journals

PubMed Central

Three typical interactions exhibiting surprising variety.

Author: Burkhard Rost (6137)
Tobias Hamp (111320)

(A) Protein ‘ras’ binds to ‘son of sevenless’ (1NVV): alternative binding for sequence-identical pairs of proteins and without a multimeric context; the lower left panel shows the residues of the two interfaces in purple and red. (B) Natural dimeric interactions between proteins from the protein kinase and cyclin families (interface copy number 1; e.g. 1OI9). Cyclin chains (green) have been structurally aligned and superimposed. Protein kinases (cyan and blue) were subject to the same geometric translations. The blue chain has a recently discovered outlier interface (see text). (C) Superimposition of entire sequence-identical F1-ATPase complexes. Complexes were aligned and superimposed with the gamma chains (green). Alpha (orange) and beta (cyan) subunits were subject to the same geometric translations. In the main panel, we look at the complexes from the top. The inset displays an interaction between a beta and a gamma subunit from the side.

FigShare